Genetic nomenclature for Drosophila melanogaster
Introduction
The rules for the genetic nomenclature of Drosophila melanogaster have evolved over the last 85 years or so. This document is a statement of these rules, as adopted by FlyBase. These rules are based on those published in Lindsley and Zimm (1992), The genome of Drosophila melanogaster (Academic Press). Although much of the existing nomenclature conforms to these rules, some does not. Past practice, and the continued existence of names and conventions that clearly flout these rules (even in FlyBase), is not an excuse for bad future practice. Now that the Drosophila database is kept in electronic form, a consistent and non-redundant nomenclature is of special importance. The nomenclature now used by FlyBase will evolve towards these standards. For internal reasons FlyBase sometimes differs from or extends current nomenclatural standards. These differences or extensions are explained in this document. Advice on nomenclature can be obtained from FlyBase:
FlyBase, Biological Laboratories,
Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA,
or via the contact FlyBase form. In particular, the FlyBase consortium welcomes the opportunity to give advice on the naming of genes, alleles, aberrations and transgene constructs. We will undertake checks for users, so as to avoid conflicts of names and symbols.
1. Gene names and symbols
1.1. Names. Gene names must be concise. They should allude to the gene's function, mutant phenotype or other relevant characteristic. The name must be unique and not have been used previously for a Drosophila gene (see paragraph 9). The name should be inoffensive. A gene can have only one valid FlyBase name and symbol. All other published symbols for a gene are recorded in FlyBase as synonyms (see section 9. Valid Symbols & Synonyms below for an explanation of how valid symbols are determined). On occasion FlyBase is wholly unable, on the basis of a publication, to assign any meaningful name to a gene, or putative gene. In that case, FlyBase will give the gene the name and symbol anon-, with some distinguishing suffix. If and when further information becomes available this name will be changed to something more meaningful, and the anon- name will be kept as a synonym.
1.1.1. Case of initial letter. The name begins with a lowercase letter when the gene is named for a mutant phenotype recessive to the wild-type in a normal diploid. The name begins with an uppercase letter when the gene is named for a mutant phenotype that is dominant to the wild-type in a normal diploid. Genes named after a protein product or other molecular feature begin with an uppercase letter.
1.1.2. Genes named for RNAs. Genes for tRNAs have names of the form tRNA:XN:m, where X is the 1-letter amino-acid code (in upper-case) (IUPAC-IUB, 1969, J. Biol. Chem.
243(13): 3557--3559), N is a number signifying the particular isoform and m is, preferably, a cytogenetic map position followed by a lower case letter, e.g., tRNA:S7:23Ea, tRNA:S7:23Eb for the two different serine-7 tRNA genes that map to polytene chromosome region 23E. Genes named for small-nuclear RNAs have similar names, i.e., snRNA:n:m, where n is the type of snRNA and m signifies a cytogenetic map position, with a distinguishing final letter should more than one similar class of snRNA gene map to the same polytene chromosome lettered subdivision; e.g., snRNA:U6:96Aa, snRNA:U6:96Ab. By historical convention the gene encoding the major ribosomal RNAs is called bobbed (bb). Annotated RNA genes will also have CR prefix synonyms (see 1.1.3.). 1.1.3. Gene and Annotation IDs. Whole genome gene model annotation sets for the 12 sequenced species of Drosophila are represented in a common way: a species-specific 2 letter prefix followed by a four or five digit integer (ABnnnn or ABnnnnn). For historical reasons, there are two 2-letter prefixes for D. melanogaster: CG for protein-coding genes and CR for RNA-genes. For all other species, there is a single two-letter code to be used for gene models, regardless of which class of gene they identify. The two letter codes for the 12 species are:
Prefix    Species
CG, CR    Drosophila melanogaster
GA        Drosophila pseudoobscura pseudoobscura
GD        Drosophila simulans
GE        Drosophila yakuba
GF        Drosophila ananassae
GG        Drosophila erecta
GH        Drosophila grimshawi
GI        Drosophila mojavensis
GJ        Drosophila virilis
GK        Drosophila willistoni
GL        Drosophila persimilis
GM        Drosophila sechellia
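The identifier format and prefix table above can be checked mechanically. The following is a minimal sketch, not FlyBase code: the function name parse_annotation_id is hypothetical, and the only rules it assumes are the ones stated here (a two-letter prefix from the table followed by a four- or five-digit integer).

```python
import re

# Prefix-to-species map copied from the table above.
ANNOTATION_PREFIXES = {
    "CG": "Drosophila melanogaster",  # protein-coding genes
    "CR": "Drosophila melanogaster",  # RNA genes
    "GA": "Drosophila pseudoobscura pseudoobscura",
    "GD": "Drosophila simulans",
    "GE": "Drosophila yakuba",
    "GF": "Drosophila ananassae",
    "GG": "Drosophila erecta",
    "GH": "Drosophila grimshawi",
    "GI": "Drosophila mojavensis",
    "GJ": "Drosophila virilis",
    "GK": "Drosophila willistoni",
    "GL": "Drosophila persimilis",
    "GM": "Drosophila sechellia",
}

# Two uppercase letters followed by a four- or five-digit integer.
_ANNOTATION_ID = re.compile(r"([A-Z]{2})(\d{4,5})")


def parse_annotation_id(identifier):
    """Return (prefix, species, number), or None if the identifier does not fit.

    Hypothetical helper written for illustration only.
    """
    match = _ANNOTATION_ID.fullmatch(identifier)
    if match is None:
        return None
    prefix, digits = match.groups()
    species = ANNOTATION_PREFIXES.get(prefix)
    if species is None:
        return None
    return prefix, species, int(digits)
```

For example, parse_annotation_id("GJ12345") reports Drosophila virilis, while a three-digit identifier or an unlisted prefix yields None.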
In the absence of other information, the ABnnnn or ABnnnnn annotation identifier is also used as the placeholder for the gene symbol, if and until a meaningful symbol for that gene is published. 1.1.4. Other Genome Project Gene Identifiers. FlyBase includes genes identified solely by EST or STS sequences. These are named with a prefix to indicate the project (B for BDGP, E for European, N for NIDDK/NIH), either EST: or STS: (to indicate a cDNA or genomic sequence tag), and a clone name and a suffix to indicate from which end of the clone the sequence was determined (T for T7 promoter, S for SP6 promoter). For example, ESTS:4C4T is a gene named for a European STS determined from the T7 promoter of cosmid 4C4. BEST:CK00068 is an example of a gene named for a BDGP EST cluster and NEST:bs05e12 for an NIDDK EST cluster. 1.2. Drosophila prefix. A prefix to indicate that the gene is from Drosophila, e.g. D, Dm, Dmel or Dro is redundant and is, therefore, not used. If it is necessary to draw a distinction between a melanogaster gene and that of another organism which would otherwise have the same symbol, Dmel\ should be used as the preferred prefix, in line with the principle used to denote species other than melanogaster within FlyBase. 1.2.1. Genes from species other than D. melanogaster. FlyBase includes genes from all species of Drosophilidae plus genes from other families that have been introduced into Drosophila (see section 3.2.2). For species other than Drosophila melanogaster, the valid gene symbol follows a species abbreviation indicating the species of origin. The prefix has the form Nnnn\, where N is the initial letter of the genus (i.e., D for Drosophila or Dettopsomyia) and nnn is a unique code, usually the first three letters of the species name (e.g., sim for D. simulans). A list of valid species abbreviations is available on FlyBase. 
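As an illustration of the Nnnn\ prefix convention, the sketch below separates a species abbreviation from a gene symbol. The function name split_species is hypothetical, and the code assumes only the four-character form described above (one uppercase genus initial plus three lowercase letters); symbols without a prefix are treated as D. melanogaster.

```python
import re

# One uppercase genus initial, three lowercase letters, a backslash, then the symbol.
_SPECIES_PREFIX = re.compile(r"([A-Z][a-z]{3})\\(.+)")


def split_species(symbol, default="Dmel"):
    """Return (species_abbreviation, gene_symbol).

    Hypothetical helper: a symbol with no Nnnn\\ prefix is assumed to be
    a D. melanogaster symbol and is returned with the default abbreviation.
    """
    match = _SPECIES_PREFIX.fullmatch(symbol)
    if match is None:
        return default, symbol
    return match.group(1), match.group(2)
```

With this sketch, split_species(r"Dsim\w") gives ("Dsim", "w") and split_species("w") gives ("Dmel", "w"). It does not check the abbreviation against the list of valid species abbreviations.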
The valid gene symbols for other Drosophila species, wherever possible, should be identical to their Drosophila melanogaster homologues. Exceptions to this recommendation include cases where the D. melanogaster name incorporates a polytene chromosome location and hence is only of relevance to melanogaster, and cases where the symbol in the other species has already been used to refer to a different gene. Outside of the family Drosophilidae, the valid gene symbol for the species of origin of the gene should be used and should respect the capitalization rules for the wild-type gene of that species. All gene symbols should be italicized, regardless of the nomenclature rules for the species of origin. 1.3. Common prefixes. One of a number of common prefixes may be used in the names of genes that fall into one of the following classes (where n designates the chromosome, m a distinguishing symbol and a a gene whose phenotype is modified by an enhancer or suppressor):
Class                  Common prefixes used in symbols
enhancer               e(a)m, E(a)m
female sterile         fs(n)m, Fs(n)m
lethal                 l(n)m
male sterile           ms(n)m, Ms(n)m
male/female sterile    mfs(n)m, Mfs(n)m
maternal               mat(n)m, Mat(n)m
meiotic                mei
Minute                 M(n)m
mitotic mutant         mit(n)m, Mit(n)m
mutagen sensitive      mus
'Polygene'             PL(n)m
resistance             rst(n)m, Rst(n)m
suppressor             su(a)m, Su(a)m
'tumor'                tu(n)m (i.e., genes controlling production of melanotic pseudotumors)
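The parenthesized prefixes in the table share one shape: a class prefix, a designator in parentheses (a chromosome n or a modified gene a), and a distinguishing symbol m. A minimal sketch of how such symbols might be split is given below. The function name classify_prefixed_symbol is hypothetical, the mei and mus classes are left out because the table gives no parenthesized form for them, and the code assumes the designator itself contains no parentheses.

```python
import re

# Prefix classes copied from the table above; case variants share a class.
PREFIX_CLASSES = {
    "e": "enhancer", "E": "enhancer",
    "fs": "female sterile", "Fs": "female sterile",
    "l": "lethal",
    "ms": "male sterile", "Ms": "male sterile",
    "mfs": "male/female sterile", "Mfs": "male/female sterile",
    "mat": "maternal", "Mat": "maternal",
    "M": "Minute",
    "mit": "mitotic mutant", "Mit": "mitotic mutant",
    "PL": "'Polygene'",
    "rst": "resistance", "Rst": "resistance",
    "su": "suppressor", "Su": "suppressor",
    "tu": "'tumor'",
}

# prefix(designator)distinguishing-symbol; assumes no nested parentheses.
_COMPOUND = re.compile(r"([A-Za-z]+)\(([^()]+)\)(.*)")


def classify_prefixed_symbol(symbol):
    """Return (class_name, designator, distinguishing_symbol) or None.

    Hypothetical helper for illustration; it recognizes only the
    parenthesized prefix forms listed in the table.
    """
    match = _COMPOUND.fullmatch(symbol)
    if match is None:
        return None
    prefix, designator, rest = match.groups()
    class_name = PREFIX_CLASSES.get(prefix)
    if class_name is None:
        return None
    return class_name, designator, rest
```

For instance, classify_prefixed_symbol("l(3)SG44") yields ("lethal", "3", "SG44"), while a symbol such as Act5C, which has no class prefix, yields None.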
1.4.1. For lethal mutations, if a specific phenotype or a specific gene product can be associated with a lethal locus, then that phenotype or product should be used for the name of the gene; this is not prefixed by lethal, l(n). Otherwise, the general term 'lethal' (l(n)m) is applied, until analysis of the gene allows a more informative name to be assigned.
1.4.2. Lethals not named for a specific phenotype are named according to the lettered subdivision of the polytene map that they occupy. Separate lethal loci within the same subdivision are differentiated by lower case letters, e.g., l(1)1Aa and l(1)1Ab, etc. for lethals in region 1A. When a lettered subdivision has more than 26 lethal complementation groups, l(1)1Az will be followed by l(1)1Aaa, l(1)1Aab, etc. If no polytene mapping information is available then the gene is given an arbitrary code, e.g., l(3)SG44. When more information becomes available such genes will be renamed.
1.5. Common series. Genes of similar function may be given names that are only differentiated by a suffix. Preferably, this should be a polytene chromosome position, e.g. Actin-5C, Actin-42A, Actin-57B etc. Lower case letters are to be used to distinguish different genes mapping within the same chromosome subdivision that encode similar proteins (e.g., nicotinic Acetylcholine receptor alpha 96Aa, nicotinic Acetylcholine receptor alpha 96Ab). A similar system is employed for Minutes, except that, for historical reasons, their names include a chromosome designator in parentheses, e.g., M(3)62A.
1.6. X vs. 1 for the X chromosome. The symbols X and 1 for the X chromosome are synonymous. The symbol 1 is preferred in formal description of genes, aberrations and their symbols.
1.7. Symbols. A symbol is assigned to each gene. This symbol is an abbreviation of the name that uniquely designates the gene in question; it combines brevity with information.
1.7.1. A symbol must be unique.
A symbol previously used for a gene, but now considered to be a synonym, should not be re-used for a new gene (see section 9. Valid Symbols & Synonyms).
1.7.2. Symbols should not contain spaces, superscripts or subscripts. They should only contain characters from the following set: a-z A-Z 0-9 : - ( ) The : character is only used in special contexts (e.g., in the symbols of genes named after RNAs, in mitochondrial gene symbols and, as ::, in the symbols of protein fusion genes). The ( and ) characters are only used in compound symbols, e.g., where they bracket a chromosome designation. The use of Greek, or other non-roman letters, is discouraged. The character / is reserved for separating homologues in genotypes and is not supported by FlyBase as a component of any symbol. An exception to the use of superscripts in symbols is when an allele name is an integral part of a gene name, e.g., su(wa).
1.7.3. When a gene name has a suffix, e.g., Actin-5C, then the same suffix is added to the symbol, Act5C. Hyphens are not used in symbols, except to separate numbers or letters which, if strung together, would lose their descriptive content.
1.7.4. Mitochondrial genes. Genes encoded by the mitochondrial DNA should all have the prefix mt:. For example, the gene encoding subunit 4 of the mitochondrial NADH dehydrogenase has the symbol mt:ND4, and that encoding the mitochondrial leucine tRNA with the UUR anticodon has the symbol mt:tRNA:L:UUR. The symbol MT:DNA is used to represent the entire mitochondrial genome.
2. Allele names and symbols
2.1. Superscripts. Alleles at a particular gene are designated by the same name and symbol and are differentiated by distinguishing superscripts. In written text the allele designation may be separated from that of the gene by a hyphen, e.g., white-apricot.
2.2. Symbols. Allele symbols should be short, preferably no more than three characters long, and cannot contain spaces, superscripts, or subscripts.
Whenever possible superscript characters should be limited to the following set: a-z A-Z 0-9 - + : . The + symbol is reserved for the wild-type allele. Consecutive allele numbers should be used wherever possible. Greek characters may be used but are discouraged. The character \ is reserved in all gene symbol contexts for species identification. The character / is reserved as a homologue separator in genotypes and cannot be used in allele symbols. In text in which superscripting is not possible, such as ASCII files, superscripted text should be enclosed between the characters [ and ]. FlyBase makes exceptions to the brevity rule when recording in vitro mutagenesis constructs that are represented with alleles. Where these are not otherwise named FlyBase confers symbols according to a system including the initial of the last name of the first author of the first paper in which the allele was initially reported ('I' in the following examples). The most frequently used classes include:
Symbol            Meaning
cIa               for 'construct a of Author-lastname'
Scer\UAS.cIa      for 'S. cerevisiae UAS construct a of Author-lastname'
tIa               for 'transgene a of Author-lastname'
mIa               for 'minigene a of Author-lastname'
hs.PI             for 'heat shock construct of Author-lastname'
gene_symbol.PI    for 'gene promoter fusion of Author-lastname'
In addition, exceptions have been required for some large series of alleles and collections of mutations. Nevertheless, brevity of allele symbols is very much to be encouraged.
2.2.1. It is unacceptable to use, as a superscripted allele symbol, elements of the genotype in which the allele arose, since such a designation implies something more than a trivial connection between allele and element. Alleles that are revertants of a pre-existing allele are an exception to this rule.
2.2.2. While historically, the numeral 1 has been the implied superscript of nonsuperscripted symbols, this practice has created considerable ambiguity and is now discouraged. As with all other alleles, the numeral 1 should be explicitly designated (e.g., sc1, not sc).
2.2.3. For a recessive allele of a gene named as a dominant, or a dominant allele of a gene named as a recessive, the superscripts r and D, respectively, may be used; e.g., Hnr, Hnr2, and ciD.
2.2.4. For a wild-type allele, a superscripted plus character may be used; e.g., b+ or B+. The plus symbol alone implies the normal (wild-type) allele or alleles in any context, such as y1/+. It may be necessary to distinguish among more than one 'wild-type' allele. In such cases the different wild-type alleles should be given a distinguishing number, which would follow the + character in the superscript, e.g., ry+3.
2.2.5. Absence of a particular locus may informally be noted by use of a superscript minus character with the symbol; e.g., bb-. This is not acceptable as a designation of a particular allele.
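When superscripts are written in the ASCII form described in section 2.2 (superscripted text enclosed between [ and ]), the distinctions drawn in 2.2.2, 2.2.4 and 2.2.5 can be read directly from a symbol. The sketch below is illustrative only: the function name describe_allele and the labels it returns are hypothetical, and it handles only a single trailing bracketed superscript.

```python
import re

# gene symbol followed by one bracketed superscript, e.g. sc[1] or ry[+3]
_ALLELE = re.compile(r"(?P<gene>[^\[\]]+)\[(?P<allele>[^\[\]]+)\]")


def describe_allele(text):
    """Split an ASCII allele symbol and label the kind of superscript.

    Hypothetical helper; the 'kind' labels are not FlyBase terms.
    """
    match = _ALLELE.fullmatch(text)
    if match is None:
        # No bracketed superscript: the allele is not explicitly designated (2.2.2).
        return {"gene": text, "allele": None, "kind": "no explicit allele"}
    gene, allele = match["gene"], match["allele"]
    if allele == "-":
        kind = "informal absence of locus"  # 2.2.5, not a valid allele designation
    elif allele.startswith("+"):
        kind = "wild-type"  # 2.2.4, optionally followed by a number
    else:
        kind = "other"
    return {"gene": gene, "allele": allele, "kind": kind}
```

For example, describe_allele("ry[+3]") is labelled wild-type, describe_allele("bb[-]") is labelled as an informal absence, and describe_allele("sc") is reported as having no explicit allele, which 2.2.2 discourages.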
2.2.6. Revertants or partial revertants of mutant alleles are designated by the superscript rv followed by a distinguishing number; these are placed after the allele designator, e.g., D4rv32, the 32nd revertant of D4. Revertants of dominant mutations that are deficiencies are treated not as alleles but as deficiencies and are accordingly not superscripted but listed with the distinguishing number, e.g., Df(2L)Scorv4.
2.2.7. Alleles specifying the absence of a particular enzyme or other protein are designated by the superscript n (null) followed by a distinguishing number or letter, e.g., Adhn1, or, where lack of function is inviable, by l (lethal), followed by a distinguishing number, e.g., Nrgl2.
2.2.8. An allele known to be mutant but whose specific identity is unknown is given an asterisk as an allele designation, e.g., w*.
3. Transposons and Transgene Constructs
Transposons or transgene constructs integrated into the Drosophila genome, if they cause a mutant phenotype, are both alleles and aberrations (similar to other classes of aberrations that are associated with mutant phenotypes). Where such insertions produce no mutant phenotype, they are named purely according to aberration conventions. Where transposon/transgene insertions produce a mutant phenotype by disrupting an endogenous gene, they are given names both as an allele of the mutated endogenous gene and as an aberration. The name of the allele follows conventions outlined in section 2. Rules for naming natural transposons and transgene constructs and their insertion into the genome follow.
Generic naturally occurring transposons are symbolized as ends{}, where ends stands for the symbol of a given transposon, such as P for P-element. Doc{}, copia{} and P{} are examples. A defined natural variant of the transposon family can be named by including a symbol for that name inside the brackets.
A specific insertion of a given transposon is described by including an additional unique symbol following the brackets. Insertions of natural transposons annotated as genome sequence features also have synonyms of the form TEnnnnn, for example, copia{}910 has the synonym TE20021. Symbols for constructed transposons, or transgene constructs, must always include a construct symbol, which defines a particular construct. A full transgene construct genotype consists of the source of transposon ends, included genes, construct symbol, and insertion identifier, in the form ends{genes=construct-symbol}. Once defined, ends{construct-symbol} (or less formally, construct-symbol alone) can be used in most circumstances to refer to a specific transgene construct. The symbol for a specific insertion of a given transgene construct has the form ends{construct-symbol}insertion-identifier. Further details are given in the sections that follow. Some examples:
This nomenclature is formally similar to that used for aberrations, where the ends{symbol} prefix is similar to the Df(n), Dp(n;m), etc., prefixes of aberrations, and the identifier suffix is similar to the gene-allele suffix of aberrations with associated alleles, or the alphanumeric string suffix of other aberrations. Specific rules for assembling the components of a transgene construct genotype follow. 3.1. Transposon ends. Pairs of terminal repeats which together form a transposon are symbolized by opposing braces, {}. The source of the transposon ends is indicated outside the braces, at the left end of the string by a symbol derived from the name of the transposon family: 3.1.1. Isolated terminal repeats are indicated with the family symbol followed by 3' or 5', e.g., P5' represents the isolated 5' end of a P{} transposon. 3.1.2. Multiple sets of matched transposon ends are indicated by nesting ends{} symbols, e.g., P{I{neo[RT]W[+]}}. A P transgene construct containing ry+t7.2 and an isolated hobo terminal repeat from the 5' end of a hobo element would be described as P{ry+t7.2 H5'}. Formally, this system can be extended to any insertion of mobile DNA, for example, the copia, gypsy and FB elements. Thus, the ctMR2 mutation, caused by the insertion of a gypsy element, is called gypsy{}ctMR2. When a mobile element inserts into a mutant gene already carrying a mobile element, it is the new insertion that is named. For example, a jockey insertion into ctMR2 generates ctMRpD, this is called jockey{}ctMRpD. The name describes the new insertion which has caused the new phenotype. A full genotype description, including all sets of transposable element ends, is only provided when the progenitor allele is also fully described. FlyBase uses this nomenclature not only because of its rigor, but also because its more general use may be needed if such elements are engineered. 3.2. Included genes. 
A full transgene construct description lists within the braces all functional genes, including non-Drosophila genes such as antibiotic resistance genes, bacterial and phage origins of replication, and the FLP1 recombination target (FRT), separated by spaces. The left-right order of these elements reflects their 5' to 3' order (with respect to the transposon ends) within the construct. If the order of a gene is unknown, it is placed at one end of the list, followed or preceded by a comma.
3.2.1. Drosophila melanogaster genes. Valid gene symbols are used to name D. melanogaster genes. Wild-type alleles of intact genes are indicated by a superscripted '+t' followed by an identifier, e.g., ry+t7.2 or Adh+t3.2. A convenient identifier (used in these examples) is the size of the genomic fragment carrying the wild-type gene. Transgene-construct-borne genes that do not confer wild-type function are given unique allele designations without the preceding '+t', e.g., ftzB or yD225. Replacement of promoter or other control sequences can be indicated in the allele designation, e.g., dpphs.PP for a dpp gene controlled by a heat shock promoter.
3.2.2. Species of origin. Species of origin is indicated for non-melanogaster Drosophila genes present in transgene constructs. A species code composed of the first letter of the genus (capitalized) and a three letter code, usually the first three letters of the species (lower case), is added to the gene symbol with a separating backslash, e.g., Dvir\Dfd+t7.6 for the wild-type Deformed gene from Drosophila virilis (see paragraph 1.2.1). For genes from species other than those of Drosophila the valid gene symbols are used following a four-letter symbol, as above, indicating the species of origin, e.g., Hsap for humans, Gdom for chicken, Hsim for Herpes simplex, Ecol for E. coli, etc.
For viruses, the name or abbreviation, e.g., Abelson, Adeno5, Cmeg, or symbolic name, e.g., T4, M13, the Greek symbol lambda, is sometimes used instead of a genus-species-derived four-letter symbol. In all cases, these symbols are separated from the gene symbol by a backslash \. A file of these species abbreviations is available on FlyBase. FlyBase considers transposable elements, the mitochondrial DNA and other similar entities to be species (this is because each can contain several different genes). It is for this reason that, for example, the P-element Transposase has the symbol P\T in constructs.
3.2.3. Fusion genes. Fusion genes are defined (by FlyBase) as the fusion of protein coding regions of distinct genes constructed by in vitro mutagenesis. They are named using the gene symbols of their component parts, separated by a double colon, e.g., Antp::Scr or Act88F::Scer\act1. The order of gene symbols stated in the fusion gene will be alphabetical. The complexity of these constructs is such that were each to be named according to its molecular composition, for example in the 5' to 3' direction, the number of named fusion genes would rapidly become impractical. An exception to the 'alphabetical order' rule will be made for cases where the fusion is between a D. melanogaster and a non-melanogaster gene. In such cases the melanogaster gene symbol will be stated first, e.g., tra2::Hsap\SFRS2. For historic reasons, some promoter fusions involving reporter genes such as Ecol\lacZ, though technically protein fusions, are simply treated as alleles of Ecol\lacZ. The symbol for the additional gene(s) contributing to the fusion is indicated as part of a superscript, e.g., Ecol\lacZP\T.A92. In these special cases there is no distinction made between promoter fusions and protein fusions in the gene name.
3.2.4. Modified genes. Modified genes, cDNAs and in vitro mutagenized sequences are treated as alleles, and will be curated by FlyBase as such.
They should be named, therefore, by the same conventions used to name classical alleles. The following allele symbols have been assigned by FlyBase to the commonly used modified genes of D. melanogaster:
Genes modified by the addition of a tag allowing the product to be identified, marked or purified represent a special class of modified genes. Tags are used to mark a transcript, e.g., with a piece of M13 DNA allowing the transcript to be identified by in situ hybridization. Tags are also used to mark a protein, for purposes of purification (e.g., (His)6), for purposes of identification (epitope tags) or for purposes of targeting to a cellular compartment (nls tags). FlyBase considers as tags constructs designed for these purposes and curates these modified genes as alleles of the tagged gene. Tagged genes have symbols with the format 'T:y' where T stands for Tag and y is the species\gene symbol of the tag, e.g., T:Hsap\Myc, T:Ivir\HA1, T:Hsap\p53, T:Zzzz\His6 (the Zzzz 'species' prefix is used when the tag is artificial). A complete list of tagged gene symbols and their definitions is available from FlyBase through the Genes query form. Change the 'Species' option from the default 'Dmel' to 'All'. Type 'T:*' (don't use the quotation marks) in the 'Symbol/synonym (case insensitive)' field and submit the query.
3.3. Construct symbol. Every construct must be assigned a symbol which, in conjunction with the description of the terminal repeats, uniquely describes a transgene construct, for example, P{lacW}, H{PDelta2-3}. Symbols must be unique, but should be kept as short as possible.
3.3.1. Full genotype. In the full genotype of a transgene construct, the construct symbol is the final entry within the braces, separated from the final gene symbol by the equal sign, e.g., P{lacZP\T.W w+mC ampR ori=lacW} is the full genotype of P{lacW}.
3.3.2. Short form and partial genotypes. Once defined, a transgene construct can be referred to by either the transgene symbol, e.g., P{lacW} (or, less formally, lacW), or the symbol plus insertion identifier (see below) in most contexts. Additional components can be added as needed for clarity.
For example, in stock genotypes it is preferable to include the visible markers, as in P{w+mC=lacW}thj5C8 or P{w+t11.7 ry+t7.2= wA}3-1, to avoid misunderstandings about the expected phenotypes of the flies. 3.4. Insertion identifier. The right-most position of the transgene symbol, outside the outer-most bracket, is reserved for a string that identifies a specific insertion into the genome of the defined construct. There are four cases to consider for naming insertions. 3.4.1. Insertion hits a known gene. When a mutant phenotype associated with a transgene construct insertion is assigned to a known gene, the insertion-induced allele should be named by the normal rules. Since such insertions cause new alleles, the gene-allele description is used as the identifier of the associated insertion (just as with other alleles identified as aberrations). For example, a P{lacW} insertion referred to as l(2)k05007 and then shown to be an allele of CycE becomes P{lacW}CycEk05007. Insertion-induced alleles in stock genotypes should include the aberration name of the construct, i.e., P{lacW}CycEk05007. In most other circumstances the insertion aberration prefix can be dropped and the mutation referred to in the usual way, in this case, CycEk05007. 3.4.2. Insertion defines a new gene. Often insertions cause a phenotype that cannot be associated with any known gene. In that case the insertion defines the first allele of a new gene, which is named by the normal rules, e.g., P{lacW}Trf1. 3.4.3. A mapped insertion with no phenotype. If an insertion has no phenotype but is mapped to the polytene chromosomes, then it is preferable to use the polytene chromosome subdivision to which it maps as its identifier, e.g., P{bw+L}60B. If a similar construct already has this name then that of the new one would be P{bw+L}60B-2 or similar. If the insertion is not mapped then there is no alternative but to give the insertion an arbitrary number or code, e.g., P{A92}A45. 
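The general form ends{genes=construct-symbol}insertion-identifier lends itself to a simple decomposition. The following sketch is an illustration under stated assumptions, not a FlyBase parser: the names Insertion and parse_construct are hypothetical, included genes are assumed to be separated by spaces as described in 3.2, and nested sets of transposon ends (3.1.2) are not handled and return None.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Insertion(NamedTuple):
    ends: str                   # source of the transposon ends, e.g. P
    genes: Tuple[str, ...]      # included genes listed before the = sign
    construct: Optional[str]    # construct symbol, None for a generic transposon
    identifier: Optional[str]   # insertion identifier after the closing brace


# ends{...}identifier with a single, non-nested pair of braces
_CONSTRUCT = re.compile(r"([^{}\s]+)\{([^{}]*)\}(.*)")


def parse_construct(text):
    """Split a transgene construct symbol into its parts (hypothetical helper)."""
    match = _CONSTRUCT.fullmatch(text)
    if match is None:
        return None
    ends, inner, identifier = match.groups()
    # Everything before the last = is the gene list; the rest is the construct symbol.
    genes_part, _, construct = inner.rpartition("=")
    return Insertion(
        ends=ends,
        genes=tuple(genes_part.split()),
        construct=construct.strip() or None,
        identifier=identifier or None,
    )
```

For example, parse_construct("P{w+mC=lacW}thj5C8") separates the ends P, the marker w+mC, the construct symbol lacW and the identifier thj5C8, while parse_construct("gypsy{}ctMR2") returns no genes and no construct symbol.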
This symbol must be unique and as simple as possible using only characters from the set: a-z A-Z 0-9 -
4. Cytogenetic descriptions
Breakpoints should be given according to the revised salivary gland chromosome maps published by C. B. and P. N. Bridges (see Lindsley and Zimm, 1992), except for chromosome 4, where the map of Sorsa (Chromosome maps of Drosophila Vol. II, CRC Press, 1988) should be used.
4.1. Range designations. For the location of a single object (breakpoint of aberration, gene position, site of transposon insertion, etc.) the range is given as "(d1)(S1)(b1)-(d2)(S2)(b2)", where:
Symbol    Designation
d    =    numbered division (1 to 102)
S    =    lettered subdivision (A to F)
b    =    band number (1 to n, depending upon the particular subdivision)
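The d, S and b components defined above, together with the abbreviation rules given in the remainder of this section, can be expressed as a short sketch. The function names parse_band and format_range are hypothetical; the doublet and interband conventions are deliberately left out, and the upper limit of the band number is not checked because it depends on the subdivision.

```python
import re

# numbered division, lettered subdivision A-F, band number
_BAND = re.compile(r"(\d{1,3})([A-F])(\d+)")


def parse_band(text):
    """Return (division, subdivision, band) for e.g. '32A3', or None."""
    match = _BAND.fullmatch(text)
    if match is None:
        return None
    division, band = int(match.group(1)), int(match.group(3))
    if not 1 <= division <= 102 or band < 1:
        return None
    return division, match.group(2), band


def format_range(left, right):
    """Format a range, omitting the parts of the right end shared with the left."""
    (d1, s1, b1), (d2, s2, b2) = left, right
    start = f"{d1}{s1}{b1}"
    if left == right:
        return start                      # single band, no hyphen
    if d1 != d2:
        return f"{start}-{d2}{s2}{b2}"    # different numbered divisions
    if s1 != s2:
        return f"{start}-{s2}{b2}"        # same division, different subdivision
    return f"{start}-{b2}"                # same division and subdivision
```

As a usage example, format_range(parse_band("32A3"), parse_band("32D4")) produces "32A3-D4", matching the form used in this section.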
If the range encompasses two different numbered divisions (i.e., d1 does not equal d2), then the full designations for both the left end and the right end of the range will be used, e.g., 32A3-33A2. If the range is within a single numbered division (i.e., d1=d2) but within different subdivisions (i.e., S1 does not equal S2), then the numbered division designation is not repeated to the right of the hyphen, e.g., 32A3-D4. If the range is within both the same single numbered division and the same lettered subdivision (i.e., d1S1=d2S2), then neither the division nor the subdivision designation will be repeated, e.g., 32A3-5. If a location is known to a single band, then the location will be given as (d1)(S1)(b1) with no hyphen and no repetition of the band location, e.g., 32A3. If a location is known to a single doublet, then the location will be given as (d1)(S1)(b1)-(b1+1) where (b1) and (b1+1) represent the two succeeding bands of the doublet, e.g., 32A1-2. If only one end of a location range is within a doublet, the location will simply refer to the band number maximizing the range, e.g., 32C1-D5 will be used, not 32C1,2-D5 and 32B4-C2 will be used, not 32B4-C1,2. It is sometimes necessary to represent interbands in data curated by FlyBase. Interbands have the same symbol as the immediately preceding band, with the suffix symbol +. The interband between the Bridges' bands 3A4 and 3A5 is, therefore, represented as 3A4+. 4.2. Telomeres. Telomeres are designated by nAt, where n is a chromosome number, A is the chromosome arm, and t indicates the telomere:
Symbol    Meaning
1Lt    =    the telomere of the left arm of X
1Rt    =    the telomere of the right arm of X
YLt    =    the telomere of the long arm of Y
YSt    =    the telomere of the short arm of Y
2Lt    =    the telomere of the left arm of 2
2Rt    =    the telomere of the right arm of 2
3Lt    =    the telomere of the left arm of 3
3Rt    =    the telomere of the right arm of 3
4Lt    =    the telomere of the left arm of 4
4Rt    =    the telomere of the right arm of 4
If the telomere is of unknown origin, use:
?t     =    undefined telomere
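The telomere symbols listed above follow a regular pattern, which the sketch below turns into a small lookup. The function name describe_telomere is hypothetical, and the code accepts only the symbols in the table, including ?t for a telomere of unknown origin.

```python
# Chromosome and arm names as used in the table above.
_CHROMOSOME_NAMES = {"1": "X", "Y": "Y", "2": "2", "3": "3", "4": "4"}
_ARM_NAMES = {"L": "left", "R": "right"}
_Y_ARM_NAMES = {"L": "long", "S": "short"}  # the Y arms are long and short


def describe_telomere(symbol):
    """Return the meaning of a telomere symbol, or None if it is not listed.

    Hypothetical helper for illustration only.
    """
    if symbol == "?t":
        return "undefined telomere"
    if len(symbol) != 3 or symbol[2] != "t":
        return None
    chromosome, arm = symbol[0], symbol[1]
    arms = _Y_ARM_NAMES if chromosome == "Y" else _ARM_NAMES
    if chromosome not in _CHROMOSOME_NAMES or arm not in arms:
        return None
    return f"the telomere of the {arms[arm]} arm of {_CHROMOSOME_NAMES[chromosome]}"
```

For example, describe_telomere("YSt") gives "the telomere of the short arm of Y", and a symbol that is not in the table, such as YRt, gives None.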
4.3. Centromeres and centric heterochromatin. Centromeres are designated as ncen, where n indicates the chromosome, i.e., 1cen, Ycen, 2cen, 3cen and 4cen.
4.3.1. Centric heterochromatic blocks will be indicated as hn, where n is a consecutive number.
4.4. Composite chromosome architecture. The designations of the chromosomes, including polytene band ranges, heterochromatic blocks and centromeres are:
YLt h1 -- h17 Ycen h18 -- h25 YSt
Note that the centromeres of chromosomes 2 and 3 lie within heterochromatic bands h38 and h53 respectively. Some heterochromatic bands (h25, h42) are divided into two (h25A, h25B, h42A, h42B) in some stocks.
4.5. Accuracy of cytological descriptions. In designating cytological position, the level of accuracy of the determination should be reflected in the specificity of the statement. Some examples should make these distinctions clear. Note that the polytene subdivision described here, 77B, has 9 bands.
5. Chromosome aberrations
Chromosome aberrations have names that consist of a prefix, indicating the class of aberration, an indication of the chromosome, or chromosomes (or their arms) involved contained within parentheses and a specific designation which identifies the particular rearrangement.
5.1. General principles for naming aberrations.
5.1.1. Aberrations not named after a gene: The suffix (i.e., the component of the name following the parentheses) should include only letters and digits. There should be no superscripts or subscripts except for the particular cases of synthetic inversions with L and R superscripts (see 5.4.4). They should not contain spaces. The characters ( and ) are only to be used to enclose the designation of a chromosome or chromosome arm.
5.1.2. Aberrations named after a gene but not associated with an allele: Here the association with the gene carries circumstantial information about the aberration's breakpoints. The suffix should comprise the gene symbol, followed by a hyphen if needed for clarity, followed by any alphanumeric of the investigator's choosing. There should be no superscripts.
5.1.3. If a gene whose symbol appears in an aberration changes its name, e.g., for reasons of newly-discovered allelism, then this name change is propagated to the aberration(s) in question. The old name will become a synonym.
5.1.4. Aberrations named for a specific associated allele: Here the suffix should be exactly the same as the allele designation, i.e. the gene symbol followed by the superscripted allele symbol. If the allele designation (either gene or allele part) changes, that change will be propagated to the aberration.
5.2.1. Translocations have the symbol T(n1;n2...)m, where n1, n2 ... indicate the numbers of the chromosomes involved in the translocation. When chromosomes are listed within the parenthetical information of a translocation symbol they are listed in the order: 1, Y, 2, 3, 4.
The numbers of the different chromosomes are separated by semicolons, with no spaces. 5.2.2. The separable components of translocations. Previous conventions for naming such aneuploid segregants have been difficult to employ and do not contain sufficient information in the derivative name to permit automated recognition of the relationship between aneuploid segregant and euploid progenitor. FlyBase will employ the following conventions for different classes of euploid chromosomal aberrations and their aneuploid derivatives. 5.2.2.1. Translocation segregants. Translocations, standardly named T(n1;n2)m, consist of two or more translocated chromosomes, each of which can potentially exist as an aneuploid segregant. Such segregants will be named using telomeres of the rearranged chromosomes as landmarks for specific segregants. Two-break translocations are often called reciprocal translocations if two chromosome segments have simply been exchanged. The general form of the name of a segregant will be Ts(n1Pt;n2Qt)m. Ts stands for 'Translocation segregant', n1Pt and n2Qt are the designations of the landmark telomere(s) (e.g., 2Lt, 3Rt), and m is the same suffix as that of the progenitor translocation from which the segregant is derived.
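The symbol forms in 5.2.1 and 5.2.2.1 can be sketched in code. The following is a minimal illustration only, not a FlyBase tool: the function names and the suffix m1 are hypothetical, and the telomere designations are passed through unchecked and in the order given, because the text above does not state an ordering rule for them.

```python
# Order in which chromosomes are listed inside the parentheses (section 5.2.1).
CHROMOSOME_ORDER = ["1", "Y", "2", "3", "4"]


def translocation_symbol(chromosomes, designation):
    """Build T(n1;n2...)m with the chromosomes listed in the order 1, Y, 2, 3, 4."""
    unknown = [c for c in chromosomes if c not in CHROMOSOME_ORDER]
    if unknown:
        raise ValueError(f"unknown chromosome(s): {unknown}")
    distinct = set(chromosomes)
    if len(distinct) < 2:
        raise ValueError("a translocation involves at least two different chromosomes")
    ordered = sorted(distinct, key=CHROMOSOME_ORDER.index)
    # Semicolons separate the chromosome numbers, with no spaces.
    return f"T({';'.join(ordered)}){designation}"


def segregant_symbol(telomeres, designation):
    """Build Ts(n1Pt;n2Qt)m from landmark telomeres such as '2Lt' and '3Rt'.

    The designation is the suffix of the progenitor translocation.
    Assumption: telomeres are kept in the order supplied by the caller.
    """
    return f"Ts({';'.join(telomeres)}){designation}"


# Hypothetical suffix 'm1', used purely for illustration.
print(translocation_symbol(["3", "Y", "2"], "m1"))
print(segregant_symbol(["2Lt", "3Rt"], "m1"))
```

Because the segregant reuses the progenitor's suffix, the relationship between a segregant and its progenitor translocation can be recovered from the two names alone, which is the automated recognition that 5.2.2 asks for.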
5.2.2.2. Complex segregants and recombinants. For many complex translocations or inversions with four or more breakpoints, multiple aneuploid segregants or recombinants can potentially occur. It is impossible to invent a naming scheme for these complex cases that would automatically reveal the specific aneuploid chromosome complement. In such instances, resulting aneuploids will be given appropriate names as follows: The first duplication or deletion is assigned the unique suffix of the parental euploid rearrangement. The new order of the resulting chromosome must be reported. Succeeding duplications or deletions are assigned other unique suffixes. Their new orders must also be reported. 5.3. Rings. Ring chromosomes have the symbol R(n)m, where n indicates the number of the chromosome and m is a specific designation. 5.4.1. Inversions have the symbol In(nA)m, where n indicates the number of the chromosome involved, A the arm or arms involved and m is a specific designator. In the case of multiple-break intrachromosomal rearrangements, the distinction between inversions and transpositions often becomes ambiguous. An intrachromosomal rearrangement that can be partitioned into a duplicated and a deficient product by exchange with a normal-sequence chromosome is designated a transposition even though it may carry an inverted segment; otherwise, it is designated an inversion. 5.4.2. If it is not known whether an inversion is paracentric (does not include the centromere) or pericentric (includes the centromere), then the indicator of chromosome arm(s) is omitted, i.e., In(n)m. 5.4.3. By convention, In(1) implies In(1L). 5.4.4. Recombinant products between two inversions. Recombination between similar inversions may produce viable recombinant inversions with the left end of one and the right end of the other. Superscripts L and R are used to identify the sources of the two ends; for example, In(2L)CyLtR. 5.5. Transpositions. 
Among interchromosomal rearrangements, the term transposition is reserved for that class in which the telomeres of the chromosomes involved are coupled (that is to say, form the two ends of a single DNA molecule) as in wild-type. Rearrangements that alter the pairing of telomeres are classified as translocations. In the case of multiple-break intrachromosomal rearrangements, the distinction between inversions and transpositions often becomes ambiguous. An intrachromosomal rearrangement that can be partitioned into a duplicated and a deficient product by exchange with a normal-sequence chromosome is designated a transposition even though it may carry an inverted segment; otherwise, it is designated an inversion. 5.5.1. Transpositions have the symbol Tp(n1;n2)m, where n1 is the 'donor' chromosome, n2 the 'recipient' chromosome and m a specific designation. For intrachromosomal transpositions n1 = n2. 5.5.2. Separable components of transpositions. 5.5.2.1. Interchromosomal transpositions. Segregants of interchromosomal transpositions will continue to be referred to as in the past. For a transposition with the name Tp(n1;n2)m, the chromosome segregant containing the duplicated material will be named Dp(n1;n2)m, and the chromosome containing the deleted material will be named Df(n1A)m, where A refers to the chromosome arm of the deletion.
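The rule in 5.5.2.1 is mechanical enough to sketch in code. The following is a rough illustration under stated assumptions, not a FlyBase implementation: the function name, the regular expression and the example symbol Tp(2;3)m1 are all hypothetical, and the arm bearing the deletion has to be supplied by the caller because it cannot be read from the transposition symbol.

```python
import re

# Hypothetical pattern for an interchromosomal transposition symbol Tp(n1;n2)m.
TP_PATTERN = re.compile(r"^Tp\(([^;()]+);([^;()]+)\)(.+)$")


def transposition_segregants(tp_symbol, deleted_arm):
    """Return (duplication, deficiency) segregant names for Tp(n1;n2)m.

    deleted_arm is the arm of the donor chromosome that carries the deletion
    (for example 'L' or 'R'); it is not encoded in the Tp symbol itself.
    """
    match = TP_PATTERN.match(tp_symbol)
    if match is None:
        raise ValueError(f"not a transposition symbol: {tp_symbol!r}")
    donor, recipient, designation = match.groups()
    if donor == recipient:
        # Intrachromosomal transpositions are handled separately (5.5.2.2).
        raise ValueError("this sketch covers interchromosomal transpositions only")
    duplication = f"Dp({donor};{recipient}){designation}"
    deficiency = f"Df({donor}{deleted_arm}){designation}"
    return duplication, deficiency


# Hypothetical example symbol, used purely for illustration.
print(transposition_segregants("Tp(2;3)m1", "L"))
```

Both segregants keep the suffix of the progenitor transposition, so the three names remain linked by the shared designation.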
5.5.2.2. Intrachromosomal transpositions. Segregants here are produced by recombination with a structurally normal chromosome, not by chromosome segregation. For transpositions in which the transposed segment is in the uninverted orientation relative to the standard map, there may be two potential duplication and two potential deletion derivatives (one set resulting from recombination events in the region between the deficiency and duplication components of the transposition, and one set resulting from recombination events within the transposed segment). For transpositions of the type Tp(n1;n1)m, the reported duplication segregant will be named Dp(n1;n1)m and the new order must be reported to eliminate any ambiguity. Similarly, the reported deletion recombinant is referred to as Df(n1A)m, where A refers to the chromosome arm bearing the deletion. In rare cases in which the alternative duplication or deletion recombinant (generated by recombination within the transposed segment) is also reported, it will be given a different suffix from the progenitor transposition and the new order will be reported.
If the other deletion or duplication recombinant is subsequently generated, it will be given a novel suffix, perhaps completely unrelated to the progenitor, e.g.:
5.6. Deficiencies (deletions). Deficiencies (deletions) have the symbol Df(nA)m, where n is the number of the deleted chromosome, A is the chromosome arm and m is a specific designator. Intragenic deletions are not treated as deficiencies, but as alleles; at least two adjacent loci must be removed or disrupted before a lesion is considered a deletion. 5.7. Duplications. Duplications have the symbol Dp(n1;n2)m, where n1 is the 'donor' chromosome, n2 the 'recipient' and m a specific designator; n1 may equal n2. Duplications may be: tandem (in direct or inverted order), insertional or free. Direct and inverted tandem duplications are not distinguished by their symbols. Ambiguity must be avoided by explicit description of the new order (see section 9. Valid Symbols & Synonyms). 5.7.1. When the duplicated sequences are carried as a free centric element, the letter f (free) follows the semicolon within the parentheses, replacing n2; e.g., Dp(1;f)101. 5.7.2. Higher order repeats. Higher-order repeats are also symbolized Dp, with the number of repeats indicated in the parenthetical chromosomal designation, i.e., Dp(1;1) = duplication, Dp(1;1;1) = triplication, and so forth. 5.8. Y derivatives. In the past many Y chromosome derivatives (e.g., marked-Y chromosomes) were named in a rather special way, as m1Ym2, where m1 is a marker (or markers) carried on YL and m2 a marker (or markers) carried on YS. Such chromosomes should be named as duplications, following the normal rules. Thus a y+Y is Dp(1;Y)y+ and Ymal+ is Dp(1;Y)mal+. 5.9. Autosynaptic elements. A pericentric inversion can be converted to two reciprocal autosynaptic elements by recombination between the inverted segment and a normal homolog. For a pericentric inversion of the type In(nLR)m, the two autosynaptic products are LS(n)m and DS(n)m, where LS refers to the product carrying the two left (L = levo) telomeres and DS to that carrying the two right (D = dextro) telomeres. 
Chromosome elements of very similar structures to autosynaptic elements can be recovered by other means; by convention, these are also called autosynaptic elements if autosynaptic elements were used in their recovery. 5.9.1. In stocks, autosynaptic elements must be carried as balanced pairs; their symbols are then separated by a double slash, thus: LS(n)m1//DS(n)m2. In the special case where the two members of such a balanced pair are reciprocal recombinant products (e.g., LS(n)m1//DS(n)m1) then such a genotype can be called AS(n)m1. 5.10. Compound chromosomes. Compound chromosomes may be subdivided into two classes: homocompounds, consisting of two copies of the same chromosomal arm attached to a common centromere, and heterocompounds, in which two arms from different chromosomes are connected through the centromere of one of them. They are designated by the symbol C followed parenthetically by the designation of the involved chromosome arm or arms. In stock genotypes, the linkage relationship of markers on compound chromosomes is indicated with a colon, e.g., C(4)RM-P2, ci1 eyR: gvl1 svn. 5.10.1. Homocompounds. Homocompound chromosomes are classified according to relative orientation of their arms (i.e., tandem, reversed or ring) and the position of their centromeres (i.e., acrocentric or metacentric): reversed acrocentrics (C(n)RA), reversed metacentrics (C(n)RM), reversed rings (C(n)RR), tandem acrocentrics (C(n)TA), tandem metacentrics (C(n)TM), and tandem rings (C(n)TR), where n is the number of a chromosome or chromosome arm. In each case the symbol is followed by a specific designator, separated by a hyphen. 5.10.1.1. When the component arms differ in sequence by something other than whole-arm inversion, the tandem or reversed classification becomes ambiguous. 
Furthermore, when the component arms are separable from each other by a single break, the terms acrocentric and metacentric are descriptive; however, when elements of the two arms become interspersed (as for example by interarm rearrangements), these terms lose meaning. Consequently, the more-complex compounds are given arbitrary symbols. 5.11. Heterocompounds. Heterocompound chromosomes have the symbol C followed by the chromosome or arms involved within parentheses, e.g., C(1;Y), C(2L;3R). The chromosomal origin of the centromere in such compounds is frequently ambiguous. It is usually necessary to describe the structure of any given heterocompound in some more detail, by its new order. The distinction between some heterocompound chromosomes and whole-arm translocations can be moot. The term 'free' is used with respect to the left and right arms of the major autosomes, and to the long and short arms of the Y chromosome, when an arm exists as an individual chromosome element. The symbol for a free arm is: F(nA)m, where n = Y, 2 | |